Semi-supervised Statistical Inference for Business Entities Extraction and Business Relations Discovery
Abstract
The sheer volume of user-contributed data on the Internet has motivated organizations to explore collective business intelligence (BI) for improving business decision making. One common problem in BI extraction is accurately identifying the entities referred to in user-contributed comments. Although named entity recognition (NER) tools are available to identify basic entities in texts, challenging research problems remain, such as co-reference resolution and the identification of abbreviations of organization names. The main contribution of this paper is a novel semi-supervised method for identifying business entities (e.g., companies) and, from these, automatically constructing business networks. Based on the automatically discovered business networks, financial analysts can then predict the business prestige of companies to support better financial investment decisions. Initial experiments show that the proposed NER method for business entity identification is more effective than the baseline methods. Moreover, the proposed semi-supervised business relationship extraction method is more effective than state-of-the-art supervised machine learning classifiers when few training examples are available. Our research contributes to advancing computational methods for extracting entities and their relationships from texts.
Similar resources
Semi-Supervised Text Mining For Dynamic Business Network Discovery
Recently, much research effort has been devoted to the discovery and analysis of online social networks. However, relatively little research has been done for business network discovery and analysis. Although named entity recognition (NER) tools are available to identify basic entities in texts, there are still challenging research problems, such as co-reference resolution and the identificatio...
Data Analysis Project: Semi-Supervised Discovery of Named Entities and Relations from the Web
This project studies semi-supervised discovery of named entities, relational entities and prepositional phrase attachments within a read-the-web framework. Meanings of an entity can be improvised and updated faster in the internet world than printed references. The main idea of this project is to study the feasibility of characterizing entities by web content directly. The approach is that cont...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP), intensified by the growing volume and heterogeneity of information and by its unstructured form. One of the core information extraction tasks is relation extraction wh...
Simultaneous Identification of Biomedical Named-Entity and Functional Relations Using Statistical Parsing Techniques
In this paper we propose a statistical parsing technique that simultaneously identifies biomedical named-entities (NEs) and extracts subcellular localization relations for bacterial proteins from the text in MEDLINE articles. We build a parser that derives both syntactic and domain-dependent semantic information and achieves an F-score of 48.4% for the relation extraction task. We then propose ...
Simultaneous Identification of Biomedical Named-Entity and Functional Relation Using Statistical Parsing Techniques
In this paper we propose a statistical parsing technique that simultaneously identifies biomedical named-entities (NEs) and extracts subcellular localization relations for bacterial proteins from the text in MEDLINE articles. We build a parser that derives both syntactic and domain-dependent semantic information and achieves an F-score of 48.4% for the relation extraction task. We then propose ...
Publication date: 2011